Augmenting short-term cepstral features with long-term discriminative features for speaker verification of telephone data
Authors
Abstract
Short-term cepstral features have long been the standard features for speaker recognition thanks to their relevance and effectiveness. In contrast, discriminative features, computed by a multi-layer perceptron (MLP) from much longer stretches of time, have been gradually adopted in automatic speech recognition (ASR). It has been shown that augmenting short-term cepstral features with long-term MLP features significantly improves ASR performance. In this work, we investigate whether augmenting short-term cepstral features with MLP features can improve the performance of text-independent speaker verification. We show that, even though augmenting cepstral features with MLP features does not directly improve speaker verification performance, reducing the dimension of the augmented features using principal component analysis (PCA) yields a relative reduction of around 12% in the equal error rate (EER). Experiments are performed on telephone data from the 2008 NIST SRE (Speaker Recognition Evaluation) database.
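As a rough illustration of the augmentation and dimension-reduction steps described in the abstract, the sketch below concatenates two per-frame feature streams and projects the result onto its leading principal components. It is not the authors' implementation: the function name `augment_and_reduce`, the feature dimensions (39 and 25), the target dimension (40), and the random stand-in data are all assumptions made for the example.

```python
import numpy as np

def augment_and_reduce(cepstral, mlp, n_components):
    """Concatenate per-frame features, then project onto the top PCA directions.

    Hypothetical helper for illustration; inputs are (frames, dims) arrays.
    """
    # Frame-level augmentation: stack the two feature streams side by side.
    augmented = np.hstack([cepstral, mlp])
    # Center the data before estimating the principal directions.
    mean = augmented.mean(axis=0)
    centered = augmented - mean
    # SVD of the centered data: rows of vt are the principal directions,
    # ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, mean, components

# Synthetic stand-ins for real features (all dimensions are assumptions).
rng = np.random.default_rng(0)
cepstral = rng.normal(size=(500, 39))
mlp = rng.normal(size=(500, 25))
reduced, mean, components = augment_and_reduce(cepstral, mlp, n_components=40)
print(reduced.shape)  # (500, 40)
```

In a real system the mean and projection would presumably be estimated once on training data and then applied unchanged to enrollment and test utterances, which is why the sketch returns them alongside the reduced features.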
Similar resources
Combining short-term cepstral and long-term pitch features for automatic recognition of speaker age
The most successful systems in previous comparative studies on speaker age recognition used short-term cepstral features modeled with Gaussian Mixture Models (GMMs) or applied multiple phone recognizers trained with the data of speakers of the respective class. Acoustic analyses, however, indicate that certain features such as pitch extracted from a longer span of speech correlate clearly with ...
Robustness to telephone handset distortion in speaker recognition by discriminative feature design
A method is described for designing speaker recognition features that are robust to telephone handset distortion. The approach transforms features such as mel-cepstral features, log spectrum, and prosody-based features with a non-linear artificial neural network. The neural network is discriminatively trained to maximize speaker recognition performance specifically in the setting of telephone han...
Short- and Long-Term Speech Features for Hybrid HMM-i-Vector based Speaker Diarization System
i-vectors have been successfully applied in recent years to speaker recognition tasks. This work aims at assessing the suitability of i-vector modeling within the framework of the speaker diarization task. In this context, a weighted cosine distance between two different sets of i-vectors is proposed for speaker clustering. Speech clusters generated by Viterbi segmentation are first modeled by two ...
Local spectral variability features for speaker verification
Speaker verification techniques neglect the short-time variation in the feature space even though it contains speaker-related attributes. We propose a simple method to capture and characterize these spectral variations through the eigenstructure of the sample covariance matrix. This covariance is computed using a sliding window over spectral features. The newly formulated feature vectors represent...
MFCC and Prosodic Feature Extraction Techniques:
In this paper, our main aim is to describe the differences between cepstral and non-cepstral feature extraction techniques. We try to cover most of the comparative features of Mel Frequency Cepstral Coefficients and prosodic features. In speaker recognition, two types of techniques are available for feature extraction: Short-term features i.e. Mel Frequency Cepstral Coefficient (MFCC)...